Oct4


This Project

This web page originated as an assignment in Emory University's Biology 142 lab course. Students were assigned proteins of interest and asked to research what is known about the protein and to examine whether the newly sequenced whale shark genome had evidence of an orthologous protein.

Background Information

Oct4 (octamer-binding transcription factor 4, also known as POU5F1) plays an important role in embryonic stem cells, which are cells that can develop into any other cell type in the body (Figure 1). As a transcription factor, Oct4 binds to DNA to help control which genes are expressed (that is, used to make RNA and/or proteins) in embryonic stem cells. In fact, Oct4 is sometimes referred to as a 'master regulator' of embryonic stem cells because it helps turn stem cell genes 'on' and differentiation genes 'off'.
Figure 1. Stem cells. Embryonic stem cells (ESCs) are pluripotent, meaning that they are capable of differentiating into many other cell types. ESCs can either self-renew (maintain their pluripotent state) or differentiate into other cell types. Image from Wikimedia Commons.

Oct4 heterodimerizes with another transcription factor, Sox2 (sex determining region Y-box 2), and together they control genes important for the stem cell fate (Rizzino 2013 & Boyer et al. 2005). Too much or too little Oct4 protein within an ESC leads to differentiation and loss of pluripotency (Niwa et al. 2000 & Chambers et al. 2003). Oct4 plays such a powerful role in the control of stem cells that forced expression of Oct4 in other cell types can cause those cells to dedifferentiate into pluripotent stem cells (Takahashi & Yamanaka 2006). Inappropriate expression of Oct4 has also been shown to lead to dedifferentiation of cells into a cancer stem cell-like state (Kumar et al. 2012).

Here, we were unable to identify a strong candidate Oct4 ortholog in the newly sequenced whale shark genome, but we did identify several likely POU domain transcription factors.

Methods

Whale shark predicted orthologs
The human Oct4 protein sequence (ENSP0000047512) was used as the query in a BLAST search against the predicted whale shark protein database using the whaleshark.georgiaaquarium.org Galaxy server. The top predicted protein hits were then used as queries (using the full predicted sequence, not only the aligned region) in protein BLASTs against the NCBI human protein database. The whale shark predicted protein database was also searched using the elephant shark predicted Oct4 protein sequence as the query.
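
The back-search logic described above (often called a reciprocal best-hit check) can be sketched in a few lines of Python. This is only an illustration, not the Galaxy or NCBI workflow itself: the function names and the Oct4 label are assumptions made for the example, and the hit lists hold only the top human hit reported for each whale shark protein in Table 2.

```python
# Minimal sketch of a reciprocal best-hit check (function names are hypothetical).
# Each back-search result is a list of (human_hit_name, e_value) pairs.

def best_hit(hits):
    """Return the (name, e_value) pair with the lowest E-value."""
    return min(hits, key=lambda hit: hit[1])

def is_reciprocal_best_hit(expected_name, back_hits):
    """True if the best human hit in the back-search is the protein we started from."""
    name, _ = best_hit(back_hits)
    return name == expected_name

# Top human hits for the four whale shark candidates, copied from Table 2.
back_searches = {
    "g23191.t1": [("POU domain, class 3, TF 4", 3e-147)],
    "g44106.t1": [("POU domain, class 3, TF 2", 1e-171)],
    "g44634.t1": [("POU domain, class 3, TF 4", 1e-163)],
    "g48496.t1": [("POU domain, class 3, TF 4", 4e-137)],
}

# Assumed label for human Oct4 in the back-search output.
OCT4_NAME = "POU domain, class 5, TF 1"

supported = {
    protein_id: is_reciprocal_best_hit(OCT4_NAME, hits)
    for protein_id, hits in back_searches.items()
}
print(supported)
```

A candidate is only treated as a likely ortholog when the back-search returns the original query protein as its best hit; a class 3 POU protein as the top hit does not meet that bar.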
Predicted orthologs
Predicted Oct4 orthologs were identified in species other than whale sharks using the NCBI BLAST server. Protein BLASTs were performed against single-species protein databases for mouse, zebrafish, African clawed frog, fruit fly, Arabidopsis, yeast and elephant shark. The human Oct4 protein (ENSP0000047512) was used as the query sequence in these searches with default settings.
Phylogenetic tree
The hit with the lowest E-value from each non-whale shark species search (using the human protein as query), along with the top 4 whale shark BLAST hits, was used to create a multiple sequence alignment and phylogenetic tree. ClustalW2 with default settings was used to create the alignment and tree.
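
Tree-building programs generally start from pairwise distances between aligned sequences. As a rough illustration of that idea (not a description of ClustalW2's exact calculation, which may apply corrections or handle gaps differently), the sketch below computes a simple p-distance, the fraction of ungapped aligned positions at which two sequences differ. The function name and the toy sequences are hypothetical and are not real Oct4 data.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared

# Hypothetical toy alignment, not real protein sequences.
alignment = {
    "toy_A": "MKR-LQEA",
    "toy_B": "MKRTLQDA",
    "toy_C": "MRR-IQDA",
}

names = sorted(alignment)
distances = {
    (x, y): p_distance(alignment[x], alignment[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Sequences separated by smaller distances end up closer together in the resulting tree, which is why highly similar proteins tend to form a single cluster.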

Searching for Oct4 in the whale shark

The human Oct4 protein sequence was used to query the whale shark predicted protein database; the results are shown in Table 2. There were 4 hits with E-values below 1e-31, with the next smallest E-value being 6e-15. These 4 best hits were then used as queries against the human protein database using NCBI BLASTp.
| ID | Length | Top hit name | Top hit E-value |
|---|---|---|---|
| g23191.t1 | 385 | POU domain, class 3, TF 4 | 3e-147 |
| g44106.t1 | 380 | POU domain, class 3, TF 2 | 1e-171 |
| g44634.t1 | 419 | POU domain, class 3, TF 4 | 1e-163 |
| g48496.t1 | 363 | POU domain, class 3, TF 4 | 4e-137 |
Table 2. Best BLASTp hits for human Oct4 against the whale shark predicted protein database. The Galaxy server was used to query the predicted whale shark protein database using the human Oct4 protein sequence. The top 4 hits according to E-value are reported here with their database ID and amino acid length. These sequences were also used as queries against the NCBI human protein database; the name and E-value of the top human hit (lowest E-value) are reported here.

None of the whale shark predicted proteins identified with the human Oct4 query returned Oct4 as the best hit when searched against the human protein database, so we were not confident that any of these four were Oct4 orthologs. We then repeated this process using the elephant shark predicted Oct4 protein as query against the whale shark predicted protein database to see whether a more closely related species would return different best hits. There were 4 hits with E-values below 1e-64 (the lowest had an E-value of 4e-68), and the next lowest E-value was 4e-35. The top 4 hits exactly matched the top 4 hits from the search using the human protein as query.
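
The separation between the top four hits and the remaining hits is easiest to appreciate on a log scale. The short sketch below (the function name is hypothetical) computes the gap, in orders of magnitude, between the reported cutoff for the top four hits and the next-best E-value in each search. Because the top four hits all fall below the cutoff, each value is a lower bound on the true gap.

```python
import math

def orders_of_magnitude_gap(cutoff_e_value, next_e_value):
    """Difference in log10 E-value between a cutoff and the next-best hit."""
    return math.log10(next_e_value) - math.log10(cutoff_e_value)

# E-values reported in the text for the two whale shark searches.
human_query_gap = orders_of_magnitude_gap(1e-31, 6e-15)
elephant_shark_query_gap = orders_of_magnitude_gap(1e-64, 4e-35)

print(f"human Oct4 query: at least {human_query_gap:.1f} orders of magnitude")
print(f"elephant shark query: at least {elephant_shark_query_gap:.1f} orders of magnitude")
```

In both searches the top four hits are separated from the rest by well over ten orders of magnitude, which supports treating them as a distinct group of candidates.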

Protein domains

The potential Oct4-like proteins in the whale shark (from BLAST results) all contain both a POU domain and a homeodomain (Figure 2). POU domains are named for three groups of POU domain proteins: Pit1, Oct1/2, and Unc86. POU domains are DNA-binding domains and are always accompanied by a homeodomain. POU domain proteins are known to play roles in the control of animal development (Phillips and Luisi 2000). Homeodomains are DNA-binding domains approximately 60 amino acids long (Scott et al. 1989). Homeodomain proteins can control expression of the genes they bind and are highly involved in controlling developmental processes (Gehring 1992).
Figure 2. Putative domains of the whale shark predicted proteins that were the best hits for Oct4. All four best-hit whale shark predicted proteins contain putative POU domains and homeodomains, as predicted by NCBI BLAST server analyses.

Orthologs

The human Oct4 protein sequence (ENSP0000047512) was used as query in NCBI BLAST searches against individual species' protein databases. POU domain, class 5, TF 1 orthologs were found using this method in mice, zebrafish and elephant sharks. Frogs, flies, plants and yeast all returned homeodomain proteins but not necessarily Oct4 orthologs, although an Oct4 protein has been identified in at least African clawed frogs (Morrison & Brickman 2006). POU domain proteins underwent expansion during vertebrate evolution, so it is likely that Oct4-specific orthologs will not be found in non-vertebrates even though ancestral POU domain proteins are present (Zhang et al. 2013).
| Species | Name | ID | Length | E-value |
|---|---|---|---|---|
| Homo sapiens | POU domain, class 5, TF 1 | NP_001272915.1 | 164 | NA |
| Mouse | POU domain, class 5, TF 1 | NP_001239381.1 | 221 | 1e-101 |
| Zebrafish | POU domain, class 5, TF 1 | NP_571187.1 | 472 | 2e-33 |
| Elephant shark | Predicted POU domain, class 5, TF 1.1-like | XP_007898544.1 | 470 | 2e-35 |
| African clawed frog | POU class 5 homeobox 3, gene 1 | NP_001272403.1 | 445 | 5e-37 |
| Fruit fly | Ventral veins lacking | NP_523948.1 | 427 | 1e-26 |
| Arabidopsis | Homeodomain-like superfamily protein | NP_192234.2 | 507 | 6e-5 |
| Yeast | Yox1p | NP_013685.1 | 385 | 2e-4 |
Table 1. Best hits from human Oct4 protein BLAST searches. The human Oct4 sequence was used in protein BLASTs against individual species. The name, ID, length and E-value of the best hit from each search are reported here. Highlighted entries return Oct4 as the best hit when used as query in a BLAST against the human protein database.

Phylogeny

The best hits from protein database searches using the human Oct4 protein as query were used to create a phylogenetic tree. In this tree, the four whale shark predicted proteins group together, indicating that they share a high degree of similarity (Figure 3). Surprisingly, these whale shark proteins then group with the fly ventral veins lacking protein (one of 3 POU domain transcription factors in flies) rather than with the elephant shark predicted Oct4 protein. One possible interpretation is that whale sharks have several early POU domain transcription factors that do not correspond exactly to Oct4 or to the other more specialized POU domain transcription factors found in higher vertebrates, but instead resemble a more general, ancestral POU domain protein.
Figure 3. Phylogenetic tree of Oct4 best hits. The best hit from each BLAST search of a species' protein database (or the best 4 hits in the case of whale sharks) was used in the ClustalW2 program to create a phylogenetic tree. Branch lengths represent relative sequence divergence.

Conclusions

While we were not able to confidently identify a predicted Oct4 ortholog in whale sharks, we were able to find evidence for several POU domain transcription factors. POU transcription factors are known to have undergone expansion during vertebrate evolution, and because sharks represent an early-splitting branch in the vertebrate phylogenetic tree (Figure 4), it is entirely possible that whale sharks do not have a true Oct4 ortholog but rather an ancestral form of POU proteins. The predicted Oct4 ortholog in the elephant shark does suggest that an Oct4 ortholog may exist in whale sharks. However, the elephant shark predicted protein has high sequence similarity both to human Oct4 and to other POU domain proteins (with Oct4 not even being the best E-value hit), so more research should be done to explore the presence and annotation of POU domain proteins in sharks. This is especially important given the critical role Oct4 plays in pluripotent cells and the enormous medical interest in, and potential uses of, stem cells.

Figure 4. Phylogenetic tree of vertebrates. Image courtesy of Epipelagic.

References

Boyer L., Lee T., Cole M., Johnstone S., Levine S., Zucker J., Guenther M., Kumar R., Murray H., Jenner R., Gifford D., Melton D., Jaenisch R., and Young R. (2005). Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122(6):947-56.

Chambers I., Colby D., Robertson M., Nichols J., Lee S., Tweedie S., and Smith A. (2003). Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell 113:643-55.

Gehring W. (1992). The homeobox in perspective. Trends Biochem. Sci. 17(8):277-80.

Kumar S.M., Liu S., Lu H., Zhang H., Zhang P.J., Gimotty P.A., Guerra M., Guo W., and Xu X. (2012). Acquired cancer stem cell phenotypes through Oct4-mediated dedifferentiation. Oncogene 31(47):4898-911.

Morrison G., and Brickman J. (2006). Conserved roles for Oct4 homologues in maintaining multipotency during early vertebrate development. Development 133(10):2011-22.

Niwa H., Miyazaki J., and Smith A. (2000). Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells. Nat. Genet. 24(4):372-6.

Phillips K., and Luisi B. (2000). The virtuoso of versatility: POU proteins that flex to fit. J. Mol. Biol. 302(5):1023-9.

Rizzino A. (2013). Concise review: The Sox2-Oct4 connection: critical players in a much larger interdependent network integrated at multiple levels. Stem Cells 31(6):1033-9.

Scott M., Tamkun J., and Hartzell G. (1989). The structure and function of the homeodomain. Biochim. Biophys. Acta 989(1):25-48.

Takahashi K., and Yamanaka S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126:663-76.

Zhang X., Ma Y., Liu X., Zhou Q., and Wang X. (2013). Evolutionary and functional analysis of the key pluripotency factor Oct4 and its family proteins. J. Genet. Genomics 40(8):399-412.